Edit Distance with Move Operations
Abstract
The traditional edit-distance problem is to find the minimum number of insert-character and delete-character (and sometimes change-character) operations required to transform one string into another. Here we consider the more general problem of a string represented by a singly linked list (one character per node), where these operations can be applied to the pointer associated with a vertex as well as to the character associated with the vertex. That is, in O(1) time, not only can characters be inserted or deleted, but substrings can be moved or deleted. We limit our attention to the ability to move substrings and leave substring deletions for future research. Note that an O(1)-time substring move operation implies an O(1)-time substring exchange operation as well, a form of transformation that has been of interest in molecular biology. We show that this problem is NP-complete, and that a “recursive” sequence of moves can be simulated by a non-recursive sequence with at most a constant-factor increase. Although a greedy algorithm is known to have poor worst-case performance (a polynomial factor), we present a polynomial-time greedy algorithm for non-recursive moves which, on a subclass of instances of a problem of size n, achieves an approximation factor to optimal of at most O(log n). The development of this greedy algorithm shows how to reduce moves of substrings to moves of characters, and how to convert moves of characters to only inserts and deletes of characters.
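The linked-list model in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `Node`, `from_string`, `to_string`, `node_at`, and `move_substring` are hypothetical, and the sketch assumes a sentinel head node and that the caller already holds references to the relevant nodes. Under those assumptions, moving a substring takes three pointer assignments, whatever the substring's length.

```python
class Node:
    """One character per node, as in the singly linked list model."""
    def __init__(self, ch, next=None):
        self.ch = ch
        self.next = next


def from_string(s):
    """Build a list for s behind a sentinel head node; return the sentinel."""
    head = Node(None)
    tail = head
    for ch in s:
        tail.next = Node(ch)
        tail = tail.next
    return head


def to_string(head):
    """Read the characters back, skipping the sentinel."""
    out = []
    node = head.next
    while node is not None:
        out.append(node.ch)
        node = node.next
    return "".join(out)


def node_at(head, i):
    """Return the node i steps after the sentinel (i = 0 is the sentinel).

    This lookup is O(i) and exists only to set up the demonstration;
    the O(1) claim applies to move_substring once the nodes are known.
    """
    node = head
    for _ in range(i):
        node = node.next
    return node


def move_substring(before_first, last, dest):
    """Move the run before_first.next .. last so it follows dest.

    Assumption: dest is not one of the nodes inside the moved run.
    Only three pointers change, so the cost does not depend on the
    length of the substring.
    """
    first = before_first.next
    before_first.next = last.next  # unlink the run
    last.next = dest.next          # attach the run's tail at the destination
    dest.next = first              # attach the run's head at the destination


if __name__ == "__main__":
    head = from_string("abcdef")
    # Move the substring "bc" so that it follows "e".
    move_substring(node_at(head, 1), node_at(head, 3), node_at(head, 5))
    print(to_string(head))  # adebcf
```

An exchange of two disjoint substrings can be built from a constant number of such moves, which is consistent with the abstract's remark that an O(1) move implies an O(1) exchange.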
Similar Resources
Analyzing Edit Distance on Trees: Tree Swap Distance is Intractable
The string correction problem looks at minimal ways to modify one string into another using fixed operations, such as for example inserting a symbol, deleting a symbol and interchanging the positions of two symbols (a “swap”). This has been generalized to trees in various ways, but unfortunately having operations to insert/delete nodes in the tree and operations that move subtrees, such as a “s...
Optimizing Textual Entailment Recognition Using Particle Swarm Optimization
This paper introduces a new method to improve tree edit distance approach to textual entailment recognition, using particle swarm optimization. Currently, one of the main constraints of recognizing textual entailment using tree edit distance is to tune the cost of edit operations, which is a difficult and challenging task in dealing with the entailment problem and datasets. We tried to estimate...
An Efficient Uniform-Cost Normalized Edit Distance Algorithm
A common model for computing the similarity of two strings X and Y of lengths m, and n respectively with m n, is to transform X into Y through a sequence of edit operations which are of three types: insertion, deletion, and substitution of symbols. The model assumes a given weight function which assigns a non-negative real cost to each of these edit operations. The amortized weight for a given ...
Tree Edit Distance, Alignment Distance and Inclusion
We survey the problem of comparing labeled trees based on simple local operations of deleting, inserting and relabeling nodes. These operations lead to the tree edit distance, alignment distance and inclusion problem. For each problem we review the results available and present, in detail, one or more of the central algorithms for solving the problem.
A survey on tree edit distance and related problems
We survey the problem of comparing labeled trees based on simple local operations of deleting, inserting, and relabeling nodes. These operations lead to the tree edit distance, alignment distance, and inclusion problem. For each problem we review the results available and present, in detail, one or more of the central algorithms for solving the problem. Keywords: tree matching, edit distance
Publication date: 2002